
Provide a way to terminate unfinished requests after graceful shutdown #5941

Merged (12 commits) on Dec 11, 2024


@ikhoon (Contributor) commented on Oct 16, 2024

Motivation:

Requests that are still unfinished after the graceful shutdown period are forcibly closed with ClosedSessionException. Because ClosedSessionException indicates that the connection was unexpectedly disconnected, it is not suitable for graceful shutdown.

In this PR, I propose adding ShuttingDownException to terminate unfinished requests when a server stops.

Modifications:

  • Introduced GracefulShutdown to customize graceful shutdown behavior.
    • Users can specify an error function that creates the exception used to terminate unfinished requests.
  • Fixed HttpServerHandler to send error responses using the error function of GracefulShutdown.
  • Fixed Server to send error responses first and then close the connections.
  • (Deprecation) ServerConfig.gracefulShutdownQuietPeriod() and ServerConfig.gracefulShutdownTimeout() have been deprecated in favor of ServerConfig.gracefulShutdown().
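The ordering described in the Modifications list (terminate each unfinished request with an exception from the error function, then close the connection) can be modeled without Armeria. This is a minimal sketch under stated assumptions: `Connection`, `PendingRequest`, and the local `ShuttingDownException` are hypothetical stand-ins rather than Armeria types, and the error function only mirrors the `(ctx, req)` shape used in the Result example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;

// Hypothetical, Armeria-free model of "fail unfinished requests first, then close".
final class ShutdownOrderingSketch {

    // Local stand-in for illustration; not Armeria's ShuttingDownException.
    static final class ShuttingDownException extends RuntimeException {
        ShuttingDownException() {
            super("server is shutting down");
        }
    }

    // An in-flight request whose response has not been sent yet.
    static final class PendingRequest {
        final String path;
        final CompletableFuture<Void> responseSent = new CompletableFuture<>();

        PendingRequest(String path) {
            this.path = path;
        }
    }

    // A connection that tracks its unfinished requests and records what happened to it.
    static final class Connection {
        final List<PendingRequest> unfinished = new ArrayList<>();
        final List<String> events = new ArrayList<>();
        boolean open = true;
    }

    // Terminates every unfinished request with the exception produced by the error
    // function, and only after that closes the connection.
    static void shutdown(Connection conn,
                         BiFunction<Connection, PendingRequest, Throwable> errorFunction) {
        for (PendingRequest req : conn.unfinished) {
            final Throwable cause = errorFunction.apply(conn, req);
            req.responseSent.completeExceptionally(cause);
            conn.events.add("failed " + req.path);
        }
        conn.open = false;
        conn.events.add("closed");
    }
}
```

Because every request is failed before the connection goes away, a client would receive an error response for each request instead of observing an abrupt disconnect.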

Result:

You can now use GracefulShutdown to terminate unfinished requests when a server stops.

```java
GracefulShutdown gracefulShutdown =
  GracefulShutdown
    .builder()
    .quietPeriod(Duration.ofSeconds(10))
    .timeout(Duration.ofSeconds(15))
    .shutdownErrorFunction((ctx, req) -> {
        return new ServerStopException();
    })
    .build();

Server
  .builder()
  .gracefulShutdown(gracefulShutdown);
```
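The Result example pairs a 10-second quiet period with a 15-second timeout. As a rough, Armeria-independent illustration of how two such settings can combine into one "may the server stop now?" decision, here is a sketch. The class `ShutdownDeadline` and its `canStop` method are hypothetical, and the exact rule Armeria applies (for example, how it measures inactivity) is an assumption here, not taken from the PR.

```java
import java.time.Duration;

// Hypothetical, simplified model (not Armeria's implementation) of how a quiet period
// and a hard timeout can combine into a single "may the server stop now?" decision.
final class ShutdownDeadline {
    private final Duration quietPeriod;
    private final Duration timeout;

    // Assumes timeout >= quietPeriod, which the builder in this PR validates.
    ShutdownDeadline(Duration quietPeriod, Duration timeout) {
        this.quietPeriod = quietPeriod;
        this.timeout = timeout;
    }

    boolean canStop(Duration sinceShutdownStarted, Duration sinceLastResponse, int activeRequests) {
        if (quietPeriod.isZero()) {
            return true; // no graceful period: stop right away without waiting
        }
        if (sinceShutdownStarted.compareTo(timeout) >= 0) {
            return true; // hard deadline: stop even if a request is stuck
        }
        // Quiet period: nothing in flight, and no response completed for quietPeriod.
        return activeRequests == 0 && sinceLastResponse.compareTo(quietPeriod) >= 0;
    }
}
```

The timeout check comes first so that a stuck request can never keep the server alive past the deadline.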

@ikhoon added this to the 1.31.0 milestone on Oct 16, 2024
@ikhoon marked this pull request as ready for review on October 23, 2024 02:52
Comment on lines +47 to +58
```java
/**
 * Returns the quiet period to wait for active requests to end before shutting down.
 * {@link Duration#ZERO} means the server will stop right away without waiting.
 */
Duration quietPeriod();

/**
 * Returns the amount of time to wait before shutting down the server regardless of active requests.
 * This should be set to a time greater than {@code quietPeriod} to ensure the server shuts down even
 * if there is a stuck request.
 */
Duration timeout();
```
Member:

Should we make this even more flexible rather than just letting a user configure the quiet period and timeout? For example:

```java
interface GracefulShutdown {
    ...

    // Armeria core passes its GracefulShutdownHandler to this method.
    void startGracefulShutdown(GracefulShutdownHandler handler);
}

// A GracefulShutdown implementation calls back Armeria core via this handler.
interface GracefulShutdownHandler {
    void gracefulShutdownStarted(...);
    void quietPeriodComplete(...);
    void gracefulShutdownComplete(...);
    @Nullable
    Throwable toException(ctx, req);
}
```

Contributor Author:

I've been thinking about the design of GracefulShutdownHandler, but I don't yet have a good idea of which parameters its methods should take.

Should we introduce GracefulShutdownHandler later, once there is a concrete requirement to customize the low-level graceful shutdown logic?

Member:

Sure! No problem.
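For reference, the callback-style design discussed in this thread (and deferred) could look roughly like the following. All type and method names here are hypothetical and simplified from the suggestion (for example, `toException` and the elided parameters are left out); the point is the inversion of control: the server core hands a handler to a user-supplied policy, and the policy decides when each phase ends. A plain `ScheduledExecutorService` stands in for whatever timer the core would provide.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the deferred callback-style design; none of these types exist in Armeria.
final class CallbackShutdownSketch {

    // Implemented by the server core; the policy calls back into it.
    interface ShutdownHandler {
        void gracefulShutdownStarted();
        void quietPeriodComplete();
        void gracefulShutdownComplete();
    }

    // Implemented by the user; decides when each phase ends.
    interface ShutdownPolicy {
        void startGracefulShutdown(ShutdownHandler handler);
    }

    // A policy that reproduces plain "quiet period + timeout" behavior with two timers.
    static ShutdownPolicy fixedTimers(Duration quietPeriod, Duration timeout,
                                      ScheduledExecutorService scheduler) {
        return handler -> {
            handler.gracefulShutdownStarted();
            scheduler.schedule(handler::quietPeriodComplete,
                               quietPeriod.toMillis(), TimeUnit.MILLISECONDS);
            scheduler.schedule(handler::gracefulShutdownComplete,
                               timeout.toMillis(), TimeUnit.MILLISECONDS);
        };
    }

    // Stands in for the server core and records the callbacks it receives.
    static final class RecordingHandler implements ShutdownHandler {
        final List<String> events = new CopyOnWriteArrayList<>();

        @Override public void gracefulShutdownStarted() { events.add("started"); }
        @Override public void quietPeriodComplete() { events.add("quietPeriodComplete"); }
        @Override public void gracefulShutdownComplete() { events.add("complete"); }
    }
}
```

With this shape, the fixed quiet-period/timeout behavior becomes just one policy among many, which is the flexibility the suggestion was after.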

@jrhee17 (Contributor) left a review:

Looks good overall. I left a minor question about thread safety when closing connections.

```java
for (Channel ch : children) {
    final HttpServerHandler serverHandler = ch.pipeline().get(HttpServerHandler.class);
    if (serverHandler != null) {
        closeFutures.add(serverHandler.shutdown(ch));
```
Contributor:

This is called from the startStopExecutor, which breaks the assumption that HttpServerHandler#cleanup is called from the event loop assigned to the channel.

Is this intentional?

Contributor Author:

Nice catch, it was my mistake.
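The concern in this thread is that per-channel cleanup should run on the event loop that owns the channel, not on the `startStopExecutor`. A minimal, Armeria-free sketch of that confinement pattern, assuming each channel is owned by one single-threaded executor standing in for its Netty event loop (in Netty, `ch.eventLoop()` plays that role); `Channel`, `cleanup()`, and `shutdownAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model: each "channel" is owned by a single-threaded executor (its event loop),
// and its state must only be touched from that thread.
final class EventLoopConfinementSketch {

    static final class Channel {
        final ExecutorService eventLoop;
        volatile String cleanupThreadName; // which thread ran cleanup()

        Channel(String name) {
            eventLoop = Executors.newSingleThreadExecutor(r -> new Thread(r, "event-loop-" + name));
        }

        // Touches channel state, so it must only run on eventLoop.
        void cleanup() {
            cleanupThreadName = Thread.currentThread().getName();
        }
    }

    // Called from a stop thread: instead of calling cleanup() directly, hand the work
    // to each channel's own event loop and combine the resulting futures.
    static CompletableFuture<Void> shutdownAll(List<Channel> channels) {
        final List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (Channel ch : channels) {
            futures.add(CompletableFuture.runAsync(ch::cleanup, ch.eventLoop));
        }
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]));
    }
}
```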

```java
if (bossGroups.isEmpty()) {
    finishDoStop(future);
    return;
shutdownServerHandlers().handle((unused3, unused4) -> {
```
Contributor:

I understand now: requests are first terminated with an exception, and only then is the socket closed.

As a result, users will see more 503 responses instead of connection resets.

```java
        .map(DecodedHttpRequest::whenResponseSent)
        .toArray(CompletableFuture[]::new);
CompletableFuture.allOf(futures).handle((unused0, unused1) -> {
    completionFuture.complete(null);
```
Contributor:

Note: I checked that the order of completing the future and closing the encoder doesn't really matter, because 1) pipelined requests are aborted above, and 2) the timing of closing the keepAliveHandler doesn't really matter.
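The `CompletableFuture.allOf(futures).handle(...)` pattern in the excerpt above completes the completion future once every response has been sent, regardless of whether individual responses failed. A standalone sketch of that pattern using only the JDK; `DrainSketch` and `whenAllResponsesSent` are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: complete a "connection drained" future once every in-flight
// response has been sent, whether each one succeeded or failed.
final class DrainSketch {
    static CompletableFuture<Void> whenAllResponsesSent(List<CompletableFuture<Void>> responses) {
        final CompletableFuture<Void> drained = new CompletableFuture<>();
        CompletableFuture.allOf(responses.toArray(new CompletableFuture<?>[0]))
                         // handle() runs on both normal and exceptional completion, so a
                         // response terminated with a shutdown exception does not leave
                         // the drained future pending forever.
                         .handle((unused, cause) -> drained.complete(null));
        return drained;
    }
}
```

`allOf` waits for every input to finish even if one fails early, so `drained` never completes while a response is still in flight.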

@ikhoon changed the milestone from 1.31.0 to 1.32.0 on Nov 7, 2024

@ikhoon (Contributor Author) commented on Nov 7, 2024

Changed the target milestone to 1.32.0. I need some time to figure out how to implement the API that Trustin suggested.

@ikhoon marked this pull request as a draft on November 22, 2024 05:42
@ikhoon marked this pull request as ready for review on December 5, 2024 10:04

@trustin (Member) left a review:

Nice work!

@minwoox (Member) left a review:

Looks great. 👍
Left some minor comments.

```java
 * Builds a new {@link GracefulShutdown} with the configured parameters.
 */
public GracefulShutdown build() {
    validateGreaterThanOrEqual(timeout, "timeout", quietPeriod, "quietPeriod");
```
Member:

How about moving validateGreaterThanOrEqual into this class?
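A sketch of what this suggestion might look like, with the validation helper as a private static method of the builder itself. `ShutdownSettingsBuilder` and its nested `Settings` class are hypothetical; only the `validateGreaterThanOrEqual(timeout, "timeout", quietPeriod, "quietPeriod")` call shape comes from the excerpt above.

```java
import java.time.Duration;
import java.util.Objects;

// Hypothetical builder whose validation helper is a private member of the builder itself.
final class ShutdownSettingsBuilder {

    // Immutable result of build(); stands in for a GracefulShutdown-like value object.
    static final class Settings {
        final Duration quietPeriod;
        final Duration timeout;

        Settings(Duration quietPeriod, Duration timeout) {
            this.quietPeriod = quietPeriod;
            this.timeout = timeout;
        }
    }

    private Duration quietPeriod = Duration.ZERO;
    private Duration timeout = Duration.ZERO;

    ShutdownSettingsBuilder quietPeriod(Duration quietPeriod) {
        this.quietPeriod = Objects.requireNonNull(quietPeriod, "quietPeriod");
        return this;
    }

    ShutdownSettingsBuilder timeout(Duration timeout) {
        this.timeout = Objects.requireNonNull(timeout, "timeout");
        return this;
    }

    Settings build() {
        validateGreaterThanOrEqual(timeout, "timeout", quietPeriod, "quietPeriod");
        return new Settings(quietPeriod, timeout);
    }

    // Lives in the builder, so no shared utility class is needed.
    private static void validateGreaterThanOrEqual(Duration larger, String largerName,
                                                   Duration smaller, String smallerName) {
        if (larger.compareTo(smaller) < 0) {
            throw new IllegalArgumentException(
                    largerName + " must be >= " + smallerName + " (" + largerName + ": " + larger +
                    ", " + smallerName + ": " + smaller + ')');
        }
    }
}
```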

```java
// The future returned by shutdown() will always be completed successfully.
final CompletableFuture<List<Void>> combined = CompletableFutures.allAsList(closeFutures);
config.workerGroup().schedule(() -> {
    combined.complete(ImmutableList.of());
```
Member:

This won't wait for all closeFutures to complete if completing them takes more than 1 second.

@ikhoon (Contributor Author) commented on Dec 11, 2024:

It was intentional. By this point, the graceful shutdown timeout has already elapsed since the stop process started, so this 1 second is the last chance to send shutting-down responses before the connections are closed.

Member:

If it's intentional, let's not call CompletableFutures.allAsList(closeFutures), because it's useless. We can do:

```java
final CompletableFuture<Void> future = new CompletableFuture<>();
config.workerGroup().schedule(() -> {
    future.complete(null);
}, 1, TimeUnit.SECONDS);
return future;
```

Member:

Oops, never mind. It's still needed because the combined future can complete earlier, when all closeFutures finish in less than 1 second.
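The behavior settled on in this thread (complete as soon as all close futures finish, but never wait longer than 1 second) is a race between two completions of the same future. A standalone sketch using only the JDK; `BoundedWaitSketch` and `allOrDeadline` are hypothetical names, and a plain `ScheduledExecutorService` stands in for `config.workerGroup()`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "wait for all close futures, but at most deadlineMillis".
final class BoundedWaitSketch {
    static CompletableFuture<Void> allOrDeadline(List<CompletableFuture<Void>> closeFutures,
                                                 long deadlineMillis,
                                                 ScheduledExecutorService scheduler) {
        final CompletableFuture<Void> combined = new CompletableFuture<>();
        // Early path: every close future finished (successfully or not) before the deadline.
        CompletableFuture.allOf(closeFutures.toArray(new CompletableFuture<?>[0]))
                         .whenComplete((unused, cause) -> combined.complete(null));
        // Deadline path: stop waiting; complete() is a no-op if the early path already won.
        scheduler.schedule(() -> combined.complete(null), deadlineMillis, TimeUnit.MILLISECONDS);
        return combined;
    }
}
```

Since `CompletableFuture.complete` has no effect once the future is already done, whichever path runs second is harmless.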

@minwoox (Member) left a review:

Thanks!

@ikhoon merged commit 468aec1 into line:main on Dec 11, 2024
13 of 14 checks passed
@ikhoon deleted the shutting-down-exception branch on December 11, 2024 08:38
4 participants